-
Notifications
You must be signed in to change notification settings - Fork 13.7k
Vulkan: improve mul_mat_vec_iq1_m #16907
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
| return; | ||
|
|
||
| // Number of rows to process for this workgroup | ||
| const uint rows_to_process = min(NUM_ROWS, p.stride_d - first_row); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm pretty surprised if it helped to make the changes in this function - this will prevent the compiler from unrolling loops.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see a difference from adding this, so I would prefer to keep it as it was.
@lovedheart Can you benchmark if the changes in the main function make a difference for you?
|
I don't see much of a difference either way. Maybe slight improvement on RDNA3 for n=1, maybe slightly negative on GCN, Nvidia and Intel. Hard to tell, it's close to run-to-run variance. AMD RX 8060SAMD Radeon Pro VIIIntel A770Nvidia RTX 3090 |
|
The code seems to fix the performance only on Windows. In Linux, I cannot see the improvement. In comparison with ROCm, it produced |
0cc4m
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall it's fine, but please clean up the purely cosmetical code changes and check whether the main function changes are necessary.
| [[unroll]] for (uint i = 0; i < NUM_ROWS; ++i) | ||
| temp[j][i] = FLOAT_TYPE(0); | ||
| } | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All of the above changes in this function are just code style, please revert them. It's okay to improve readability and style of code you're touching anyways, but that doesn't apply here. I also prefer to keep curly brackets after loops or ifs.
| FLOAT_TYPE temp[NUM_COLS][NUM_ROWS]; | ||
|
|
||
| void calc_superblock(const uint a_offset, const uint b_offset, const uint ib32, const uint i, const uint num_blocks_per_row, const uint first_row, const uint num_rows) { | ||
| // ------------------ calc_superblock (final optimized version) ------------------ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comment isn't necessary.
| return; | ||
|
|
||
| // Number of rows to process for this workgroup | ||
| const uint rows_to_process = min(NUM_ROWS, p.stride_d - first_row); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see a difference from adding this, so I would prefer to keep it as it was.
@lovedheart Can you benchmark if the changes in the main function make a difference for you?
./build/bin/Release/test-backend-ops.exe perf -o MUL_MAT -p type_a=iq1_m
Tested on AMD 8845HS 780M iGPU